Image Compressor

Efficient Image Compression using K-Means Clustering Algorithm

The objective of this project was to compress an image effectively using the k-means clustering algorithm. The complete code for this project can be found on my GitHub page.

K-means Clustering Algorithm

K-means is an algorithm that takes a set of data points, finds similarities between them, and groups them into clusters. It has to find patterns in the data and come up with a grouping on its own, without labeled examples, which makes it a classic example of unsupervised learning.

Working of the K-means Clustering Algorithm

To start with, the k-means clustering algorithm picks random points (two in this example) and places them among the data points. These points are known as cluster centroids.

Once the algorithm has decided the location of the cluster centroids, it scans each point in the data set. The points closest to the red cluster centroid are shaded red, and the points closest to the blue cluster centroid are shaded blue.
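The shading step above can be sketched in a few lines of NumPy. This is only an illustration: the sample points, the starting centroids, and the helper name assign_to_nearest are made up for the example and are not taken from the project code.

```python
import numpy as np


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    # distances[i, j] is the Euclidean distance from point i to centroid j.
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


# Hypothetical 2-D data: two loose groups and two starting centroids.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])  # index 0 = "red", index 1 = "blue"

labels = assign_to_nearest(points, centroids)
print(labels)  # [0 0 1 1]
```

The first two points end up "red" (label 0) and the last two "blue" (label 1).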

Once this is done, the algorithm takes the average of all the red points and moves the red cluster centroid to that position. The same is done for the blue cluster centroid.
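Moving each centroid to the average of its points might look like the sketch below. The data, the labels, and the helper name update_centroids are assumptions made for illustration.

```python
import numpy as np


def update_centroids(points, labels, k):
    """Move each centroid to the mean of the points assigned to it."""
    # Assumes every cluster has at least one assigned point.
    return np.array([points[labels == j].mean(axis=0) for j in range(k)])


# Hypothetical data: the first two points are "red" (0), the last two "blue" (1).
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])

new_centroids = update_centroids(points, labels, k=2)
print(new_centroids)  # red moves to (1.25, 1.5), blue moves to (8.5, 8.75)
```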

As observed earlier, the k-means clustering algorithm has two functions:
- Function A: deciding the number of cluster centroids to use
- Function B: deciding the position of each cluster centroid and shading all the data points nearest to it
The number of centroids is chosen once, up front. The shading and repositioning are then repeated until the centroids stop moving, which gives the final grouping.
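Putting the pieces together, the repeated assign-and-move cycle can be sketched as a small loop. This is a from-scratch illustration in NumPy, not the project's code (which relies on scikit-learn); the function name kmeans, the iteration cap, and the choice to start from randomly picked data points are all assumptions.

```python
import numpy as np


def kmeans(points, k, n_iters=100, seed=0):
    """Minimal k-means sketch: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(current):
        # Label each point with the index of its nearest centroid.
        d = np.linalg.norm(points[:, None, :] - current[None, :, :], axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(centroids)
        # Move each centroid to the mean of its points;
        # an empty cluster keeps its previous position.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids

    return centroids, assign(centroids)


# Hypothetical data with two obvious groups.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

On this toy data the loop settles on one centroid per group, whichever two points it happens to start from.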

Methodology used to compress an image

The image which has to be compressed is shown below

First, we import the necessary libraries:
- numpy for numerical operations
- sklearn.cluster.KMeans for performing k-means clustering
- matplotlib.pyplot for displaying the images
- PIL.Image for image manipulation
Next, we define the compress_image function that takes the path to an image file (image_path) and the desired number of clusters (k) as input.
Inside the function, we start by loading the image using Image.open(image_path) and converting it to a NumPy array using np.array(image). This allows us to perform operations on the image using NumPy.
We flatten the image array using image_array.reshape(-1, 3). This converts the 3D image array of shape (height, width, 3) into a 2D array of shape (height × width, 3), where each row represents one pixel and its three columns hold the color channels (RGB).
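To make the shapes concrete, here is a small sketch that uses a random 4×6 array as a stand-in for np.array(image). The array contents and dimensions are made up for the example.

```python
import numpy as np

# Stand-in for np.array(image): a 4x6 RGB image filled with random values.
rng = np.random.default_rng(0)
image_array = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

pixels = image_array.reshape(-1, 3)
print(image_array.shape)  # (4, 6, 3)
print(pixels.shape)       # (24, 3)

# Row 0 of `pixels` is the top-left pixel of the image.
print((pixels[0] == image_array[0, 0]).all())  # True

# Reshaping back to the original shape restores the image exactly.
restored = pixels.reshape(image_array.shape)
print(np.array_equal(restored, image_array))  # True
```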
We then create an instance of the KMeans class with n_clusters=k and random_state=0. This sets up the k-means clustering algorithm with the desired number of clusters and a fixed random state for reproducibility.
Next, we use the fit_predict method of the KMeans object to perform clustering on the flattened image array (pixels). This assigns a cluster label to each pixel based on its color.
We obtain the cluster centers using kmeans.cluster_centers_. These represent the average color values for each cluster.
To compress the image, we replace the color values of each pixel with the color values of its corresponding cluster center. This is done by creating a new array, compressed_pixels, where each pixel is replaced with its cluster center color.
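One compact way to do this replacement is NumPy integer-array indexing, sketched below. The center colors and labels are invented values standing in for kmeans.cluster_centers_ and the output of fit_predict.

```python
import numpy as np

# Hypothetical clustering result for six pixels and k = 2 colors.
cluster_centers = np.array([[250.2, 10.4, 12.9],   # a reddish center
                            [5.1, 8.3, 240.7]])    # a bluish center
labels = np.array([0, 0, 1, 0, 1, 1])

# Indexing with the label array picks row labels[i] of cluster_centers
# for pixel i, so every pixel becomes its cluster's center color.
compressed_pixels = cluster_centers[labels]
print(compressed_pixels.shape)  # (6, 3)
print(compressed_pixels[2])     # the bluish center, since labels[2] == 1
```

After this step the array contains at most k distinct colors, which is where the compression comes from.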
We reshape the compressed_pixels array back to the original image dimensions using compressed_pixels.reshape(image_array.shape).
We create a compressed image from the reshaped array using Image.fromarray(np.uint8(compressed_image_array)). The cluster centers are floating-point averages, so the np.uint8 conversion is needed to turn them into the 8-bit integer values (0 to 255) that an RGB image expects.
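One detail worth knowing is that np.uint8(...) drops the fractional part instead of rounding. The sketch below compares the two on made-up center values; rounding first is an optional refinement and not something the steps above say the project does.

```python
import numpy as np

# Hypothetical floating-point cluster center (R, G, B).
centers = np.array([[254.7, 0.2, 127.5]])

# Plain conversion truncates toward zero.
truncated = np.uint8(centers)
print(truncated)  # [[254   0 127]]

# Rounding to the nearest integer first, then clipping to the valid range.
rounded = np.clip(np.rint(centers), 0, 255).astype(np.uint8)
print(rounded)    # [[255   0 128]]
```

The difference is at most one intensity level per channel, so it is rarely visible.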
Next, we use matplotlib.pyplot to display the original and compressed images side by side. We create a figure with two subplots, where the first subplot shows the original image (image) and the second subplot shows the compressed image (compressed_image). We also set titles for each subplot and turn off the axis labels. Finally, we call plt.show() to display the figure.
After that, we save the compressed image as "compressed_image.jpg" in the current directory using compressed_image.save("compressed_image.jpg").
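The steps above can be sketched as follows. This is a reconstruction from the description, not the project's actual code, so details may differ. Several things are assumptions added for the sketch: the convert("RGB") call (so that grayscale or RGBA inputs also reshape to three channels), the explicit n_init=10, the output_path parameter, and the split into a separate show_side_by_side helper.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans


def compress_image(image_path, k, output_path="compressed_image.jpg"):
    """Reduce an image to k colors with k-means and save the result."""
    # Assumption: force three channels so reshape(-1, 3) always works.
    image = Image.open(image_path).convert("RGB")
    image_array = np.array(image)

    # One row per pixel, one column per color channel.
    pixels = image_array.reshape(-1, 3)

    # Cluster the pixel colors; random_state=0 makes runs reproducible.
    kmeans = KMeans(n_clusters=k, random_state=0, n_init=10)
    labels = kmeans.fit_predict(pixels)
    centers = kmeans.cluster_centers_

    # Replace every pixel with the color of its cluster center.
    compressed_pixels = centers[labels]
    compressed_image_array = compressed_pixels.reshape(image_array.shape)

    compressed_image = Image.fromarray(np.uint8(compressed_image_array))
    compressed_image.save(output_path)
    return image, compressed_image


def show_side_by_side(image, compressed_image):
    """Display the original and compressed images next to each other."""
    # Imported here so that compression alone does not need matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    axes[0].imshow(image)
    axes[0].set_title("Original Image")
    axes[0].axis("off")
    axes[1].imshow(compressed_image)
    axes[1].set_title("Compressed Image")
    axes[1].axis("off")
    plt.show()


# Example usage ("input.jpg" is a placeholder path):
# original, compressed = compress_image("input.jpg", k=15)
# show_side_by_side(original, compressed)
```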

Note on choosing the number of cluster centroids

In the code I wrote to compress the image, I used 15 cluster centroids, reasoning from the 3 primary colors, 3 secondary colors, and 6 tertiary colors.

The choice of the number of cluster centroids (k) is a crucial decision in k-means clustering. It determines the level of compression and the quality of the compressed image. The selection of the optimal value for k depends on the specific requirements and trade-offs in terms of image quality and compression ratio.
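To get a rough feel for this trade-off, the sketch below estimates how large an image would be if it were stored as a k-color palette plus one palette index per pixel. That storage layout is an assumption made for the estimate (the pipeline above saves a JPEG instead), and the 512×512 image size and the helper name palette_size_bytes are hypothetical.

```python
import math


def palette_size_bytes(width, height, k):
    """Estimated size of an uncompressed palette-indexed image with k colors."""
    # Each pixel stores an index into the palette: ceil(log2(k)) bits.
    bits_per_index = max(1, math.ceil(math.log2(k)))
    index_bytes = math.ceil(width * height * bits_per_index / 8)
    # The palette itself stores k RGB colors at 3 bytes each.
    palette_bytes = k * 3
    return index_bytes + palette_bytes


width, height = 512, 512          # hypothetical image size
raw_bytes = width * height * 3    # 24 bits per pixel, uncompressed

for k in (4, 15, 64):
    size = palette_size_bytes(width, height, k)
    print(f"k={k:>2}: {size} bytes, about {raw_bytes / size:.1f}x smaller than raw RGB")
```

Under these assumptions, k = 15 needs 4 bits per pixel instead of 24, so the estimate comes out roughly six times smaller than raw RGB. A larger k keeps more of the original colors but shrinks the image less.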

Output

The picture below shows the compressed image.

Future Scope of the Project

Image compression is widely used as a preprocessing technique in computer vision and robotics. It enables efficient storage, transmission, and processing of visual data, reducing storage requirements and conserving bandwidth. Image compression is crucial for real-time video streaming, remote sensing, robot vision, embedded systems, medical imaging, and more. By compressing images, these fields benefit from improved performance, lower costs, and enhanced capabilities in resource-constrained environments.
